# CLIP visual encoders

All of the models below are image encoders published under the `timm` organization and tagged "Image Classification · Transformers" on the hub. Names are given as their `timm` model identifiers.

| Model | License | Downloads | Likes | Description |
|---|---|---|---|---|
| `resnet101_clip_gap.openai` | Apache-2.0 | 104 | 0 | ResNet-101 image encoder from the OpenAI CLIP model, extracting image features via global average pooling (GAP). |
| `resnet50x64_clip_gap.openai` | Apache-2.0 | 107 | 0 | CLIP image encoder based on ResNet-50 with 64x width expansion, using GAP pooling. |
| `resnet50x16_clip_gap.openai` | Apache-2.0 | 129 | 0 | ResNet-50x16 CLIP variant for image feature extraction. |
| `resnet50x4_clip_gap.openai` | Apache-2.0 | 170 | 0 | ResNet-50x4 CLIP variant for image feature extraction. |
| `vit_large_patch14_clip_224.dfn2b` | Other | 178 | 0 | ViT-Large CLIP image encoder released by Apple as part of DFN2B-CLIP. |
| `vit_huge_patch14_clip_224.dfn5b` | Other | 128 | 0 | ViT-Huge CLIP image encoder released by Apple as part of DFN5B-CLIP, suited to visual feature extraction. |
| `vit_base_patch16_clip_224.dfn2b` | Other | 444 | 0 | ViT-Base CLIP encoder carrying Apple's DFN2B-CLIP image-tower weights. |
| `vit_huge_patch14_clip_224.laion2b` | Apache-2.0 | 1,969 | 0 | ViT-Huge CLIP visual encoder trained on the LAION-2B dataset. |
| `vit_base_patch32_clip_256.datacompxl` | Apache-2.0 | 89 | 0 | ViT-B/32 CLIP encoder trained on DataComp XL, accepting 256x256 input. |
| `vit_base_patch32_clip_224.laion2b` | Apache-2.0 | 83 | 0 | ViT-B/32 CLIP encoder trained on the LAION-2B dataset. |
| `vit_base_patch32_clip_224.datacompxl` | Apache-2.0 | 13 | 0 | ViT-B/32 CLIP encoder trained on the DataComp XL dataset. |
| `vit_base_patch16_clip_224.datacompxl` | Apache-2.0 | 36 | 0 | ViT-B/16 CLIP encoder trained on the DataComp XL dataset. |
| `convnext_base.clip_laiona` | Apache-2.0 | 14 | 0 | ConvNeXt-Base CLIP image encoder trained on the LAION-Aesthetic dataset. |
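Any entry in the table can be loaded through `timm` for feature extraction. The sketch below is a minimal example, not taken from the listing itself: the model name comes from the table, `example.jpg` is a placeholder path, and `num_classes=0` removes the classifier head so the forward pass returns pooled features (for the `_clip_gap` ResNets, this pooled vector is the GAP output the descriptions refer to). Swapping in any other model name from the table works the same way.

```python
# Minimal sketch: image feature extraction with one of the timm CLIP encoders above.
# Assumes `pip install timm torch pillow`; "example.jpg" is a placeholder path.
import timm
import torch
from PIL import Image

# num_classes=0 strips the classification head, so the model
# returns pooled image features instead of logits.
model = timm.create_model("resnet101_clip_gap.openai", pretrained=True, num_classes=0)
model.eval()

# Recreate the preprocessing (resize, crop, normalization) the weights expect.
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

image = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # shape: (1, feature_dim)
print(features.shape)
```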
The listing also includes one image-to-text model:

| Model | Author | License | Downloads | Likes | Description |
|---|---|---|---|---|---|
| Git Base One Piece | ayoubkirouane | MIT | 16 | 0 | A vision-language model fine-tuned from Microsoft's `git-base`, built to generate descriptive captions for images from the anime One Piece. Tagged "Image-to-Text · Transformers · Supports Multiple Languages". |
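Captions can be generated with the standard GIT pipeline in `transformers`. This is a sketch under assumptions: the Hub repo id `ayoubkirouane/git-base-One-Piece` is inferred from the listing (verify it on the Hub), and the image path is a placeholder.

```python
# Hedged sketch: image captioning with the fine-tuned GIT model via transformers.
# Assumes `pip install transformers torch pillow`.
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

repo_id = "ayoubkirouane/git-base-One-Piece"  # assumed repo id from the listing
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

image = Image.open("one_piece_frame.jpg").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")

# GIT generates the caption autoregressively from the image tokens.
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```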